\subsection{Regular derivations}
\begin{definition}
	A rule of the form \( A \to a \) is called a \emph{terminal rule}.
	A rule of the form \( A \to aB \) is called a \emph{nonterminal rule}.
\end{definition}
\begin{lemma}
	Let \( G \) be a regular grammar.
	If \( S \xrightarrow G \alpha \), then \( \alpha \in \mathbb W \cup \mathbb W V \).
\end{lemma}
\begin{proof}
	This is shown by induction on the length of the derivation.
	The length-zero derivation gives \( \alpha = S = \varepsilon S \in \mathbb W V \).
	Suppose \( S \xrightarrow G \beta \xrightarrow G_1 \alpha \) where \( \beta \in \mathbb W \cup \mathbb W V \).
	If \( \beta \in \mathbb W \), \( \beta \) contains no variables, but all rules rewrite a variable.
	This contradicts that \( \beta \xrightarrow G_1 \alpha \).
	So suppose \( \beta = wA \) for \( w \in \mathbb W \), \( A \in V \).
	Then the rule must be of the form \( A \to a \) or \( A \to aB \).
	Hence \( \beta = wa \) or \( \beta = waB \).
	In either case, the required invariant holds.
\end{proof}
\begin{lemma}
	If \( S \xrightarrow G w \) for \( w \in \mathbb W \), then the derivation has length \( \abs{w} \) and consists of precisely \( \abs{w}-1 \) nonterminal rules and one final terminal rule.
\end{lemma}
\begin{proof}
	Terminal rules preserve the length of a string, and decrement the amount of variables.
	Nonterminal rules increment the length of a string, and preserve the amount of variables.
	Given that \( S \) is a string of length one with one variable, we must apply \( \abs{w}-1 \) nonterminal rules to increment the length of the string \( \abs{w}-1 \) times.
	By the previous lemma, the number of variables in each derived string is always 0 or 1.
	If the number ever reaches zero, nothing can be rewritten.
	Given \( w \in \mathbb W \), the number must reach zero, so a single terminal rule must be applied at the end.
\end{proof}
Note that the derivation is not uniquely determined.
\begin{lemma}
	Regular languages are closed under concatenation.
\end{lemma}
\begin{proof}
	Let \( G = (\Sigma, V, P, S), G' = (\Sigma, V', P', S') \), where without loss of generality \( V \cap V' = \varnothing \).
	Let \( P^\star \) be the set of production rules given by \( P \), but for each terminal rule \( A \to a \) in \( P \), replace it with a nonterminal rule \( A \to aS' \).
	Then let \( H = (\Sigma, V \cup V', P^\star \cup P', S) \).
	We claim \( \mathcal L(H) = \mathcal L(G)\mathcal L(G') \).

	Suppose \( S \xrightarrow G v \) and \( S' \xrightarrow {G'} w \).
	Then \( S \xrightarrow H vS' \), and so \( S \xrightarrow H vw \) as required.

	Conversely, suppose \( S \xrightarrow u \) for \( u \in \mathbb W \).
	By the above lemma, the derivation is of the form
	\[ S = \sigma_0 \xrightarrow H_1 \dots \xrightarrow H_1 \sigma_n = u \]
	where \( \sigma_i = w_i X_i \) for some \( w_i \in \mathbb W, X_i \in V \).
	An initial segment of the \( X_i \) belongs to \( V \), until rewritten as \( S' \) by a rule added into \( P^\star \).
	Then, the rest of the \( X_i \) belong to \( V' \), because only the new rules in \( P^\star \) map variables between \( V \) and \( V' \).
	Hence the derivation splits into two halves, \( u = vw \) where \( S \xrightarrow G v, S' \xrightarrow {G'} w \), giving the concatenation as required.
\end{proof}

\subsection{Deterministic automata}
\begin{definition}
	Let \( \Sigma \) be an alphabet.
	Then a \emph{deterministic automaton} is a tuple of the form \( D = (\Sigma, Q, \delta, q_0, F) \) where \( Q \) is a finite set of \emph{states}, \( q_0 \in Q \) is the \emph{start state}, \( F \subseteq Q \setminus \qty{q_0} \) is the \emph{accept states}, and \( \delta \colon Q \times \Sigma \to Q \) is the \emph{transition function}.
\end{definition}
We graphically represent deterministic automata using labelled directed graphs.
The nodes are elements of \( Q \), circled twice for accept states and circled once for other states.
Each node has \( \abs{\Sigma} \)-many outgoing arrows labelled with the corresponding letter.

\begin{center}
	\begin{tikzpicture}[shorten >=1pt, node distance=2cm, on grid, auto]
		\node[state, initial] (q_0) {\( q_0 \)};
		\node[state, accepting] (q_1) [above right=of q_0] {\( q_1 \)};
		\node[state] (q_2) [right=of q_0] {\( q_2 \)};

		\path[->]
			(q_0) edge node {0} (q_1)
			(q_0) edge node [swap] {1} (q_2)
			(q_2) edge node [swap] {0} (q_1)
			(q_2) edge [loop below] node {1} ()
			(q_1) edge [loop above] node {0, 1} ()
		;
	\end{tikzpicture}
\end{center}

We intuitively interpret a deterministic automaton as a machine that starts at \( q_0 \) and reads a word \( w \in \mathbb W \) symbol-by-symbol, transitioning to a new state according to \( \delta \) at each step.
After all symbols in the word are parsed, we check whether the machine lies in an accept state or not.
We say the automaton \emph{accepts} \( w \) if the final state is an accept state; otherwise, it \emph{rejects} \( w \).
\begin{definition}
	We define by recursion a function \( \hat\delta \colon Q \times \mathbb W \to Q \) by \( \hat\delta(q,\varepsilon) = q \) and \( \hat\delta(q,aw) = \hat\delta(\delta(q,a),w) \).
	The \emph{language accepted by \( D \)} is
	\[ \mathcal L(D) = \qty{w \mid \hat \delta(q_0, w) \in F} \]
	The sequence of states produced from \( q_0 \) and reading \( w \) is uniquely determined and of length \( \abs{w} + 1 \), known as the \emph{state sequence} of the computation.
\end{definition}
We claim that in the example above, \( \mathcal L(D) = \qty{w \mid w \text{ contains at least one } 0} \).
Note that \( \hat\delta(q_0,w) = q_0 \) if and only if \( w = \varepsilon \).
There are three transitions in the diagram for the letter 0, but all such 0-transitions lead to \( q_1 \) hence every string with a zero goes to \( q_1 \).
All transitions from \( q_1 \) go back to \( q_1 \), so any string containing a zero must end at \( q_1 \).
All other strings are of the form \( 1111\dots 1 \), which end at \( q_2 \).
\begin{definition}
	Let \( D = (\Sigma, Q, \delta, q_0, F), D' = (\Sigma, Q', \delta', q_0', F') \) be deterministic automata.
	Then a map \( f \colon Q \to Q' \) is called a \emph{homomorphism} from \( D \) to \( D' \) if
	\begin{enumerate}
		\item for all \( q \in Q \) and \( a \in \Sigma \), we have \( \delta'(f(q),a) = f(\delta(q,a)) \);
		\item \( f(q_0) = q_0' \);
		\item \( q \in F \) if and only if \( f(q) \in F' \).
	\end{enumerate}
\end{definition}
In particular, if \( f \) is bijective, it has an inverse and is called an \emph{isomorphism}.
We can show by induction that \( \hat \delta'(f(q),w) = f(\hat\delta(q,w)) \).
Note that if a homomorphism \( f \) is not surjective, the states not in its range are not \emph{accessible} from \( q_0' \).
If \( f \) is not injective, so \( f(p) = f(q) \) for \( p \neq q \), then we have \( f(\hat\delta(p,w)) = \hat\delta'(f(p),w) = \hat\delta'(f(q),w) = f(\hat\delta(q,w)) \); we will say that such states \( p, q \) are \emph{indistinguishable}.
We will observe that failure to be bijective only affects inaccessible states or pairs of indistinguishable states.
\begin{proposition}
	Let \( f \) be a homomorphism (not \emph{a priori} an isomorphism) from \( D \) to \( D' \).
	Then \( \mathcal L(D) = \mathcal L(D') \).
\end{proposition}
\begin{proof}
	Let \( w \in \mathcal L(D) \), so \( \hat\delta(q_0,w) \in F \).
	Applying \( f \), \( f(\hat\delta(q_0,w)) = \hat\delta'(f(q_0),w) = \hat\delta'(q_0',w) \in F' \) as required.
	All implications are bi-implications, so the converse holds.
\end{proof}
\begin{theorem}
	Any language of the form \( \mathcal L(D) \) for a deterministic automaton \( D \) is regular.
\end{theorem}
\begin{proof}
	Let \( D = (\Sigma, Q, \delta, q_0, F) \), and define a grammar \( G = (\Sigma, V, P, S) \) by \( V = Q, S = q_0 \), and
	\[ P = \qty{p \to aq \mid \delta(p, a) = q} \cup \qty{p \to a \mid \delta(p,a) \in F} \]
	We will show \( \mathcal L(D) = \mathcal L(G) \).
	Suppose \( w = a_0\dots a_n \in \mathcal L(D) \).
	Then \( \hat \delta(q_0,w) \in F \), so there exist \( q_0, \dots, q_{n+1} \) such that \( q_{i+1} = \delta(q_i, a_i) \), and \( q_{n+1} \in F \).
	By definition of \( G \), this holds if and only if there exist \( q_0, \dots, q_{n+1} \) such that \( q_i \to a_i q_{i+1} \in P \) and \( q_n \to a_n \in P \).
	This holds if and only if there exists \( q_0, \dots, q_{n+1} \) such that \( q_0 \xrightarrow G_1 a_0 q_1 \xrightarrow G_1 \dots \xrightarrow G_1 a_0 \dots a_{n-1} q_n \xrightarrow G_1 w \), so there exists a derivation \( w \in \mathcal L(G) \).
	By regularity of \( G \), all derivations are of this form, so we have bi-implications, and \( \mathcal L(D) = \mathcal L(G) \).
\end{proof}
We will show that if \( L \) is a regular language, we can find a deterministic automaton \( D \) such that \( L = \mathcal L(D) \).
However, regular grammars can have multiple rules that may be used when reaching a single symbol, for instance \( p \to aq \) and \( p \to aq' \), so we cannot perform an obvious translation from this grammar into a deterministic automaton.
To encapsulate this notion, we introduce nondeterministic automata.

\subsection{Nondeterministic automata}
\begin{definition}
	A \emph{nondeterministic automaton} is a tuple of the form \( N = (\Sigma, Q, \delta, q_0, F) \) where \( Q \) is a finite set of states, \( q_0 \in Q \), \( F \subseteq Q \setminus \qty{q_0} \), but in contrast with deterministic automata, we have \( \delta \colon Q \times \Sigma \to \mathcal P(Q) \).
\end{definition}
We interpret \( \delta(q,a) \) as the set of possible states that the machine can transition into when reading \( a \) from state \( q \).
The graphical representation of such an automaton is the same.

\begin{center}
	\begin{tikzpicture}[shorten >=1pt, node distance=2cm, on grid, auto]
		\node[state, initial] (q_0) {\( q_0 \)};
		\node[state, accepting] (q_1) [right of=q_0] {\( q_1 \)};
		\node[state] (q_2) [below=of q_1] {\( q_2 \)};

		\path[->]
			(q_0) edge node {\( a \)} (q_1)
			(q_0) edge node [swap] {\( a \)} (q_2)
			(q_1) edge [loop right] node {\( a \)} ()
			(q_1) edge node {\( a \)} (q_2)
			(q_2) edge [loop right] node {\( b \)} ()
		;
	\end{tikzpicture}
\end{center}
Here, we define \( \hat \delta \colon Q \times \mathbb W \to \mathcal P(Q) \), by the equations
\[ \hat \delta(q, \varepsilon) = \qty{q};\quad \hat \delta(q, wa) = \bigcup_{p \in \hat\delta(q,w)} \delta(p,a) \]
This produces a unique \emph{state set sequence}, not a deterministic state sequence.
We define
\[ \mathcal L(N) = \qty{w \mid \hat\delta(q_0,w) \cap F \neq \varnothing} \]
\begin{remark}
	Deterministic automata can be seen as a special case of nondeterministic automata.
\end{remark}
\begin{theorem}
	Let \( N \) be a nondeterministic automaton.
	Then there exists a deterministic automaton \( D \) such that \( \mathcal L(N) = \mathcal L(D) \).
\end{theorem}
Our proof will involve a \emph{subset construction}.
\begin{proof}[Proof]
	Let \( N = (\Sigma, Q, \delta, q_0, F) \).
	We define \( D = (\Sigma, \mathcal P(Q), \Delta, \qty{q_0}, G) \), where
	\[ \Delta(X, a) = \bigcup_{q \in X} \delta(q,a);\quad G = \qty{X \in \mathcal P(Q) \mid X \cap F \neq \varnothing} \]
	We show that these two automata produce the same language.
	Consider the state sequence of \( D \) on input \( w \).
	\[ X_0 = \qty{q_0};\quad X_{i+1} = \bigcup_{q \in X_i} \delta(q,a_i) \]
	The state set sequence of \( N \) on input \( w \) is
	\[ Y_0 = \qty{q_0};\quad Y_{i+1} = \bigcup_{q \in Y_i} \delta(q,a_i) \]
	Clearly, these exactly match, so \( X_i = Y_i \).
	So \( w \) is accepted by \( D \) if and only if it is accepted by \( N \).
\end{proof}
\begin{remark}
	Although this construction always works, we have transformed an automaton on \( n \) states into one on \( 2^n \) states.
\end{remark}
\begin{theorem}
	Let \( G \) be a regular grammar.
	Then there exists a nondeterministic automaton \( N \) such that \( \mathcal L(G) = \mathcal L(N) \).
\end{theorem}
\begin{proof}
	Let \( G = (\Sigma, V, P, S) \).
	Let \( H \not\in \Sigma \cup V \) be a new symbol, known as the \emph{halt state}.
	Let \( Q = V \cup \qty{H} \).
	Define \( N = (\Sigma, Q, \delta, S, \qty{H}) \) where
	\[ \delta(A,a) = \begin{cases}
		\qty{B \mid A \to aB \in P} & \text{if } A \to a \not\in P \\
		\qty{B \mid A \to aB \in P} \cup \qty{H} & \text{if } A \to a \in P
	\end{cases} \]
	We claim that \( \mathcal L(G) = \mathcal L(N) \).
	If \( w \in L(G) \), we have a sequence \( A_0, \dots, A_{n+1} \) of variables such that
	\[ S = A_0 \xrightarrow G_1 \dots \xrightarrow G_1 a_0\dots a_{n+1} A_{n+1} \xrightarrow G_1 w \]
	In particular, \( A_i \to a_i A_{i+1} \in P \) and \( A_{n+1} \to a_n \in P \).
	By definition of \( \delta \), there exists a sequence \( A_1, \dots, A_{n+1} \) such that \( A_{i+1} \in \delta(A_i, a_i) \) and \( H \in \delta(A_n, a_n) \).
	Hence \( H \in \hat\delta(S,w) \), so \( w \in \mathcal L(N) \).
	All implications are bi-implications so the converse holds.
\end{proof}

\subsection{The pumping lemma for regular languages}
\begin{definition}
	A language \( L \) \emph{satisfies the regular pumping lemma with pumping number \( n \)} if every word \( w \in L \) with length at least \( n \) can be split into three parts \( w = x y z \), such that \( \abs{y} > 0 \), \( \abs{xy} \leq n \) and for all \( k \in \mathbb N \), we have \( xy^kz \in L \).
	We call \( y \) a \emph{pump} for the word \( x y z \).
\end{definition}
\begin{theorem}[regular pumping lemma]
	Every regular language satisfies the pumping lemma.
\end{theorem}
\begin{remark}
	If any word can be pumped, the language must be infinite.
\end{remark}
\begin{proof}
	Let \( L \) be a regular language.
	Then there exists a deterministic automaton \( D \) such that \( L = \mathcal L(D) \).
	We show that \( L \) has pumping number \( n = \abs{Q} \).
	Let \( w \in L(D) \) be a word with \( \abs{w} \geq n \).
	We can write \( w = a_0a_1 \dots a_{n-1} v \) where \( v \in \mathbb W \).

	The state sequence of \( D \) reading \( a_0, \dots, a_{n-1} \) is \( q_0, \dots, q_n \); it has length \( n + 1 \) since there are \( n \) state transitions.
	But there are only \( n \) states, so by the pigeonhole principle, one state must repeat.
	Let \( i < j \leq n \) such that \( q_i = q_j \).
	Let \( x = a_0 \dots a_{i-1}, y = a_i \dots a_{j-1}, z = a_j \dots a_{n-1} v \), so we have \( xyz = w \), \( \abs{y} > 0 \), \( \abs{xy} \leq n \) by construction.

	We show that we can pump the word.
	After reading \( x \), we have \( \hat\delta(q_0, x) = q_i \), and \( \hat\delta(q_i, y) = q_j = q_i \), and finally \( \hat\delta(q_i, z) = \hat\delta(q_j, z) \in F \).
	Hence, \( \hat\delta(q_0, xy^k) = q_i \) by induction on \( k \).
	In particular, \( \hat\delta(q_0, xy^kz) \in F \) as required.
\end{proof}
\begin{example}
	Let \( L = \qty{0^k 1^k, k > 0} \).
	We claim this is not a regular language.
	Suppose \( L \) is regular, and has pumping number \( N \).
	Consider the word \( 0^N 1^N \in L \); this word has more than \( N \) letters, so the word can be pumped.
	The pump must lie in the first \( N \) letters, all of which are zeroes.
	Pumping down, \( 0^{N-\ell} 1^N \in L \) where \( \ell \) is the length of the pump.
	This is a contradiction since the length of the pump is nonzero.
	Note that this language is context-free, so we know that the inclusion of regular languages in context-free languages is proper.
\end{example}
\begin{example}
	Let \( n > 0 \), and let \( L = \qty{0^n w, w \in \mathbb W} \).
	We show this is regular, but any deterministic automaton \( D \) such that \( L = \mathcal L(D) \) has more than \( n \) states.
	For regularity, we can simply write down a grammar.
	\begin{align*}
		P &= \{S \to 0X_0, X_0 \to 0X_1, \dots, X_{n-2} \to 0X_{n-1}, X_{n-2} \to 0, \\
		&X_{n-1} \to 0, X_{n-1} \to 1, X_{n-1} \to 0X_{n-1}, X_{n-1} \to 1X_{n-1}\}
	\end{align*}
	This has exactly \( n + 1 \) states.
	Suppose that an automaton with at most \( n \) states has the same language.
	Then \( L \) satisfies the pumping lemma with pumping number \( n \).
	In particular, we can pump down the word \( 0^n \), obtaining a word with fewer zeroes, and this is not in the language.
\end{example}
\begin{example}
	Some non-regular languages also satisfy the pumping lemma.
	Let \( \Sigma = \qty{0,1} \).
	If a word \( w \in \mathbb W \) contains at least one zero, we say the \emph{tail} of the word is the number of ones that follow the last zero.
	Let \( X \subseteq \mathbb N \) be an arbitrary set of naturals, and define \( L_X \) to be the set of words that contain no zeroes, or have a tail which lies in \( X \).
	If \( X \neq Y \), we have \( L_X \neq L_Y \), so \( L \) is an injection from \( \mathcal P(\mathbb N) \) to the space of languages on \( \Sigma \).
	Since there are uncountably many \( X \subseteq \mathbb N \), but there are only countably many regular languages, there must be some non-regular languages of the form \( L_X \).

	We claim that all \( L_X \) satisfy the pumping lemma, so then there must be some \( L_X \) which are non-regular which satisfy the pumping lemma.
	Let \( X \subseteq \mathbb N \); we claim this has pumping number 2.
	Let \( w \in L_x \) such that \( \abs{w} \geq 2 \).

	Suppose \( w \) starts with a zero, so \( w = 0z \).
	Then let \( x = \varepsilon, y = 0 \), so \( w = xyz \).
	Pumping up does not change the tail; pumping down either does not change the tail or there are now no zeroes, but in either case, the new word lies in the language.

	Conversely, suppose \( w \) starts with a one, so \( w = 1z \).
	Let \( x = \varepsilon, y = 1 \), so \( w = xyz \) as before.
	If \( z \) contains no zeroes, after pumping \( y \), there are still no zeroes, so the new word is in the language.
	If \( z \) contains a zero, there is a tail, and pumping \( y \) does not influence the tail.
	Hence, the pumping lemma is satisfied.
\end{example}

\subsection{Closure properties}
We have already shown that regular languages are closed under concatenation and union.
We will now show that they are closed under complement, intersection, and difference.
For this, it suffices to show they are closed under complement, because intersection and difference can be expressed in terms of complement and union.

Consider an automaton \( D = (\Sigma, Q, \delta, q_0, F) \).
Without loss of generality, we can ensure that \( \delta(q_i,a) \neq q_0 \) for all \( i, a \).
Now define \( D' = (\Sigma, Q, \delta, q_0, \mathbb Q \setminus (F \cup \qty{q_0})) \).
Then, \( \mathcal L(D') = \mathbb W^+ \setminus \mathcal L(D) \).

There is an alternative construction to obtain union and intersection, known as the \emph{product automaton} construction.
Let \( D = (\Sigma, Q, \delta, q_0, F) \) and \( D' = (\Sigma, Q', \delta', q_0', F') \).
We can define the pointwise product \( D'' = (\Sigma, Q \times Q', \delta \times \delta', (q_0, q_0'), F'') \), where \( (\delta \times \delta')((q,q'),a) = (\delta(q,a), \delta'(q',a)) \), and either \( F'' = \qty{(q,q') \mid q \in F, q' \in F'} \) or \( F'' = \qty{(q,q') \mid q \in F \text{ or } q' \in F'} \).
We can see that the language generated by this new automaton is \( \mathcal L(D) \cap \mathcal L(D') \) or \( \mathcal L(D) \cup \mathcal L(D') \).

\subsection{Emptiness problem}
\begin{lemma}
	Let \( L \) be a nonempty regular language with pumping number \( n \).
	Then there is a word \( w \in L \) such that \( \abs{w} < n \).
\end{lemma}
\begin{proof}
	Let \( w \) be a word in \( L \).
	If \( \abs{w} < n \), we are already done.
	Otherwise, it can be pumped down into a smaller word.
	By induction, we can obtain a word of length less than \( n \).
\end{proof}
\begin{corollary}
	The emptiness problem for regular grammars is solvable.
\end{corollary}
\begin{proof}
	Given a regular grammar, we can obtain its pumping number.
	We can check every word below this length because the word problem is solvable; if no words are accepted, the language is empty.
\end{proof}

\subsection{Regular expressions}
\begin{definition}
	The \emph{Kleene star} operation on a language \( L \), written \( L^\star \), is given by
	\[ L^\star = \qty{w \mid \exists \text{ sequence of words in } L, w = \text{their concatenation}} \]
	In particular \( \varepsilon \in L^\star \).
	The \emph{Kleene plus} operation is \( L^+ = L^\star \setminus \qty{\varepsilon} \).
\end{definition}
\begin{definition}
	A \emph{regular expression} on an alphabet \( \Sigma \) is defined inductively by:
	\begin{enumerate}
		\item the symbol \( \varnothing \) is a regular expression;
		\item \( \varepsilon \) is a regular expression;
		\item for all \( a \) in \( \Sigma \), \( a \) is a regular expression;
		\item if \( R, S \) are regular expressions, \( (R + S) \) is a regular expression;
		\item if \( R, S \) are regular expressions, \( (R S) \) is a regular expression;
		\item if \( R \) is a regular expression, \( R^\star \) is a regular expression;
		\item if \( R \) is a regular expression, \( R^+ \) is a regular expression.
	\end{enumerate}
	By definition, nothing else is a regular expression.
	By recursion, we can assign a language \( \mathcal L(E) \) to each regular expression \( E \).
	\begin{enumerate}
		\item \( \mathcal L(\varnothing) = \varnothing \);
		\item \( \mathcal L(\varepsilon) = \qty{\varepsilon} \);
		\item for \( a \in \Sigma \), \( \mathcal L(a) = \qty{a} \);
		\item if \( R, S \) are regular expressions, \( \mathcal L(R + S) = \mathcal L(R) \cup \mathcal L(S) \);
		\item if \( R, S \) are regular expressions, \( \mathcal L(R S) = \mathcal L(R) \mathcal L(S) \);
		\item if \( R \) is a regular expression, \( \mathcal L(R^\star) = \mathcal L(R)^\star \);
		\item if \( R \) is a regular expression, \( \mathcal L(R^+) = \mathcal L(R)^+ \).
	\end{enumerate}
\end{definition}
Note that rules (iv) and (v) introduce parentheses, occasionally unnecessarily.
When the meaning is unambiguous, these parentheses are omitted.
The binding power of concatenation \( RS \) is higher than union \( R + S \), so we can write \( RS + T \) for \( ((RS) + T) \).

We say that a language is \emph{essentially regular} if there is a regular language \( L' \) such that \( L = L' \) or \( L = L' \cup \qty{\varepsilon} \).
\begin{theorem}
	If \( E \) is a regular expression, \( \mathcal L(E) \) is essentially regular.
\end{theorem}
This is an equivalence, but the converse (often called Kleene's algorithm) is not required for this course.
\begin{proof}
	Observe that (i), (ii), (iii) are essentially regular languages, so it suffices to show that essentially regular languages are closed under (iv), (v), (vi), (vii).
	We have already shown that regular languages are closed under union and concatenation, and the proof for essentially regular languages follows easily.
	Note that \( \mathcal L(E^\star) = \mathcal L(E + E^+) \), so it suffices to show closure of regular languages under the Kleene plus; we can then perform case analysis to prove the same for essentially regular languages.

	Let \( G = (\Sigma, V, P, S) \) be a regular grammar.
	Let \( P^+ = P \cup \qty{A \to aS \mid A \to a \in P} \).
	It suffices to show that \( G^+ = (\Sigma, V, P^+, S) \) has the language \( \mathcal L(G^+) = \mathcal L(G)^+ \).

	Suppose \( w \in \mathcal L(G)^+ \), so \( w = w_0 \dots w_n \) for \( w_i \in \mathcal L(G) \).
	If \( n = 0 \), \( w \in \mathcal L(G) \) and any derivation can be translated easily into \( G^+ \).
	Otherwise, suppose \( w_0 \dots w_{n-1} \in \mathcal L(G^+) \) by induction.
	Therefore there is a derivation \( S \xrightarrow{G^+} w_0 \dots w_{n-1} \).
	This derivation ends with a terminal rule \( A \to a \), so we can replace it with a nonterminal rule \( A \to aS \), giving \( S \xrightarrow{G^+} w_0 \dots w_{n-1} S \xrightarrow{G} w_0 \dots w_{n-1} w_n \), so \( w \in \mathcal L(G) \) as required.

	Now suppose \( w \in \mathcal L(G^+) \).
	Without loss of generality we can assume that \( G \) is \( \varepsilon \)-adequate, so \( S \) does not occur on the right-hand side of a rule.
	Suppose we have a derivation \( S \xrightarrow{G^+} w \).
	Let \( n \) be the number of times that \( S \) occurs in the derivation.
	We then prove \( w \in \mathcal L(G)^+ \) by induction on \( n \).
	\( n \) cannot be zero.
	Suppose all words \( v \in \mathcal L(G^+) \) lie in \( \mathcal L(G)^+ \) if they have a derivation with \( n-1 \) occurences of \( S \).
	Since \( n \geq 1 \), we have \( S \xrightarrow{G^+} vS \xrightarrow{G^+} w \) where \( vS \) is the last occurence of \( S \) in the derivation of \( w \).
	In particular, \( S \xrightarrow{G^+} v \) with \( n - 1 \) occurences, since the last rule of \( S \xrightarrow{G^+} vS \) is one of the added rules in \( P^+ \).
	By induction, \( v \in \mathcal L(G)^+ \).
	Since \( vS \xrightarrow{G^+} w \), we know that \( w = vw' \) by considering the possible derivations in regular languages.
	Hence \( S \xrightarrow{G^+} w' \) with only one occurence of \( S \) at the start.
	In particular none of our new rules were used, so \( S \xrightarrow G w' \), so \( w' \in \mathcal L(G)^+ \), hence \( w \in \mathcal L(G)^+ \).
\end{proof}

\subsection{Minimisation of deterministic automata}
\begin{definition}
	A state \( q \) is called \emph{accessible} if there is a word \( w \) such that \( q = \hat \delta(q_0, w) \).
	A state that is not accessible is called \emph{inaccessible}.
\end{definition}
\begin{definition}
	States \( q \) and \( q' \) are \emph{distinguished} by a word \( w \) if \( \hat \delta(q, w) \in F \) and \( \hat \delta(q', w) \not\in F \), or vice versa.
	States that are distinguished by some word are called \emph{distinguishable}.
	States that are not distinguished by any word are called \emph{indistinguishable}.
\end{definition}
If \( f \colon Q \to Q' \) is a homomorphism, then
\begin{enumerate}
	\item if \( p, q \) are distinguishable, \( f(p) \neq f(q) \);
	\item if \( q' \in Q' \) is accessible, \( q' \) lies in the range of \( f \).
\end{enumerate}
In particular, if \( f \) is a homomorphism from \( D \) to \( D' \) and all pairs of nonequal states in \( D \) are distinguishable, \( f \) is injective; if all states in \( D' \) are accessible, \( f \) is surjective.
\begin{definition}
	An automaton \( D \) is called \emph{irreducible} if all pairs of nonequal states are distinguishable and all states are accessible.
\end{definition}
Hence, any homomorphism between irreducible automata is an isomorphism.

Defining \( q \sim q' \) if \( q \) and \( q' \) are indistinguishable, \( \sim \) is an equivalence relation.
As usual, we write \( [q] \) for the equivalence class of states indistinguishable from \( q \).
We can therefore define the \emph{quotient automaton} by
\[ \faktor{D}{\sim} = \qty(\Sigma, \faktor{Q}{\sim}, [\delta], [q_0], [F]);\quad [\delta]([q],a) = [\delta(q,a)];\quad [F] = \qty{[q] \mid q \in F} \]
Note that if an equivalence class contains an accept state, the class is completely contained in \( F \), so being an accept state is a class property.
The map \( [\delta] \) is well-defined: indeed, if \( q \sim q' \), we have \( \delta(q,a) \sim \delta(q',a) \), because if \( \delta(q,a) \not\sim \delta(q',a) \), there would exist a word \( w \) that distinguishes these two states, but then \( aw \) would distinguish \( q \) and \( q' \).

If \( q \not\sim q' \), we can show the two states are distinguished in the quotient automaton.
By induction, \( [\hat\delta]([q],w) = [\hat\delta(q,w)] \).
Suppose without loss of generality that \( \hat\delta(q,w) \in F \), \( \hat\delta(q',w) \not\in F \).
Then \( [\hat\delta]([q],w) \in [F] \), but \( [\hat\delta]([q'],w) \not\in [F] \).
So \( w \) distinguishes \( [q] \) and \( [q'] \).
In particular, each pair of nonequal states is distinguishable.

Note further that \( \mathcal L(D) = \mathcal L\qty(\faktor{D}{\sim}) \), because the quotient map \( q \mapsto [q] \) is a homomorphism.
If \( D \) had no inaccessible states, \( \faktor{D}{\sim} \) also has no inaccessible states, since the quotient map is surjective.
\begin{theorem}
	For every deterministic automaton, there is an irreducible deterministic automaton \( I \) such that \( \mathcal L(D) = \mathcal L(I) \).
\end{theorem}
\begin{proof}
	Let \( A \subseteq Q \) be the set of accessible states in \( D \).
	Let \( D^\star = \qty(\Sigma, A, \eval{\delta}_{A \times \Sigma}, q_0, F \cap A) \).
	The inclusion map from \( D^\star \) to \( D \) is a homomorphism, so their languages are the same.
	Now let \( I = \faktor{D^\star}{\sim} \).
	By the above discussion, \( I \) is irreducible and has the same language as \( D^\star \).
\end{proof}
\begin{remark}
	The number of states in \( I \) is at most the number of states in \( D \).
\end{remark}
\begin{theorem}
	If \( I, I' \) are irreducible deterministic automata and \( \mathcal L(I) = \mathcal L(I') \), then \( I \) and \( I' \) are isomorphic.
\end{theorem}
\begin{proof}
	It suffices to construct a homomorphism between the two automata, since any homomorphism between irreducible automata is an isomorphism.
	Let \( I = (\Sigma, Q, \delta, q_0, F) \) and \( I' = (\Sigma, Q', \delta', q_0', F') \), and without loss of generality let \( Q \cap Q' = \varnothing \).
	We can extend \( \sim \) to \( Q \cup Q' \), by defining \( q \sim q' \) if for all \( w \), \( \hat\delta(q,w) \in F \) if and only if \( \hat\delta'(q',w) \in F' \).
	We know \( q_0 \sim q_0' \), because by assumption, the languages of the two automata are the same.

	We show that for all \( q \in Q \), there exists \( q' \in Q' \) such that \( q \sim q' \).
	Let \( \mathrm{sp}(q) \) be the length of the shortest path from \( q_0 \) to \( q \).
	Since \( I \) is irreducible, this is well-defined and finite for all \( q \in Q \).
	We prove this claim by induction on \( \mathrm{sp}(q) \).
	The base case is \( \mathrm{sp}(q) = 0 \) so \( q = q_0 \), and we have already shown \( q_0 \sim q_0' \) as required.

	Now suppose \( \mathrm{sp}(q) = k + 1 \).
	Then there exists \( p \in Q \) and \( a \in \Sigma \) such that \( \delta(p,a) = q \) and \( \mathrm{sp}(p) = k \).
	By the induction hypothesis, we can find \( p' \in Q' \) such that \( p \sim p' \).
	Then let \( q' = \delta'(p',a) \), then \( q' \sim \delta(p,a) = q \).
	Hence each \( q \in Q \) has a \( q' \in Q' \) such that \( q \sim q' \).

	We now will show that if \( q' \sim q \sim p' \), we have \( q' = p' \).
	By transitivity, \( q' \sim p' \), but by irreducibility of \( I' \), \( q' = p' \).

	Because of the above results, we can construct a function \( f \colon Q \to Q' \) defined by \( q \mapsto q' \) where \( q \sim q' \).
	This is well-defined and unique.
	We now claim \( f \) is a homomorphism.
	Since \( q_0 \sim q_0' \), we have \( f(q_0) = q_0' \).
	The requirement \( q \in F \iff f(q) \in F' \) follows by definition of \( \sim \).
	Now fix \( q \in Q \) and \( q' = f(q) \), so \( q \sim q' \).
	Then, \( \delta(q,a) \sim \delta'(q',a) \), so \( f(\delta(q,a)) \sim \delta'(q',a) = \delta'(f(q),a) \).
\end{proof}
\begin{remark}
	There is a unique (up to isomorphism) irreducible automaton that accepts a given regular language, and its size is smaller than all other automata that accept the same language.
\end{remark}

\subsection{Equivalence problem}
We have already solved the word problem for noncontracting grammars and the emptiness problem for regular grammars.
To solve the equivalence problem, we will construct minimal automata for two given regular grammars, and check whether they are isomorphic; if so, the languages are the same, and otherwise, the languages are different.
We must check that this idea can be formulated into an algorithm which must complete in finitely many steps.
\begin{proposition}
	Let \( D \) be a deterministic automaton and \( q \in Q \) a state.
	Then it is solvable whether \( q \) is accessible.
\end{proposition}
\begin{proof}
	If there is a word \( w \) such that \( \hat\delta(q_0,w) = q \), then the shortest such word has length at most \( \abs{Q} \), which can be easily proven using the technique from the pumping lemma.
	We can explicitly check each word of length at most \( \abs{Q} \).
\end{proof}
\begin{theorem}[the table filling algorithm]
	Let \( D \) be a deterministic automaton and \( q, q' \in Q \) states.
	Then the proposition \( q \sim q' \) is solvable.
\end{theorem}
\begin{proof}
	Form a matrix \( A \) with entries indexed by \( Q \times Q \).
	The entry indexed by \( (q,q') \) contains information about distinguishability of \( q, q' \).
	In particular, \( A_{q,q'} \) contains either nothing or a word \( w \) distinguishing \( q \) and \( q' \).
	Since \( \sim \) is an equivalence relation, it suffices to consider the upper triangular part of the matrix, excluding the diagonal.
	To initialise the matrix, if \( q \in F \) and \( q' \not\in F \) we set \( A_{q,q'} = \varepsilon \), since the empty word distinguishes \( q, q' \).

	Then, for each \( q, q' \in Q \) that do not have a filled entry \( A_{q,q'} \) already, and for each \( a \in \Sigma \), we can check the entry indexed by \( (\delta(q,a), \delta(q',a)) \).
	If these two states are distinguished by a word \( w \), \( q \) and \( q' \) are distinguished by \( aw \).
	So we can set \( A_{q,q'} = aw \).
	This single step will terminate in a finite amount of time, on the order of \( \abs{Q}^2 \abs{\Sigma} \)-many steps.

	We then repeat this inductive step until no more assignments into the matrix can be made in an single iteration.
	This will happen in finitely many steps.

	We now must show that after this process completes, \( A_{q,q'} \) contains a word \( w \) if and only if \( q \) and \( q' \) are distinguishable, and in this case, \( w \) distinguishes \( q \) and \( q' \).
	If \( A_{q,q'} \) contains a word \( w \), it is clear that \( w \) distinguishes \( q \) and \( q' \), since \( \hat\delta(q,w) \in F \) and \( \hat\delta(q',w) \not\in F \) or vice versa.
	Now suppose there exists a word \( w \) that distinguishes some states \( q \) and \( q' \), but \( q, q' \) are unmarked in \( A \).
	Let \( q, q' \) be a pair of states with a distinguishing word \( w \) of minimal length.

	Either \( w = \varepsilon \) or \( w = av \).
	If \( w = \varepsilon \), \( q \in F \) and \( q' \not\in F \) or vice versa, so \( A_{q,q'} \) is marked.
	Otherwise, \( w = av \).
	Since \( v \) is shorter than the smallest word that distinguishes two states that are not marked in \( A \), we must have that the entry \( (\delta(q,a), \delta(q',a)) \) is marked with some word in \( A \).
	So at some step in the algorithm, this entry was added into \( A \).
	But then the algorithm would mark \( q, q' \) with a word in the next step.
\end{proof}
\begin{example}
	Consider the following automaton.
	\begin{center}
		\begin{tikzpicture}[shorten >=1pt, node distance=2cm, on grid, auto]
			\node[state, initial] (q_0) {\( q_0 \)};
			\node[state] (q_1) [right=of q_0] {\( q_1 \)};
			\node[state, accepting] (q_2) [below=of q_1] {\( q_2 \)};
			\node[state, accepting] (q_3) [below=of q_0] {\( q_3 \)};

			\path[->]
				(q_0) edge node {a} (q_1)
				(q_0) edge node [swap] {b} (q_3)
				(q_1) edge node {a} (q_2)
				(q_1) edge [loop above] node {b} ()
				(q_2) edge [loop below] node {a, b} ()
				(q_3) edge [loop below] node {a, b} ()
			;
		\end{tikzpicture}
	\end{center}
	In step zero, we find
	\begin{center}
		\begin{tabular}{c|cccc}
			& \( q_0 \) & \( q_1 \) & \( q_2 \) & \( q_3 \) \\\hline
			\( q_0 \) & & & \( \varepsilon \) & \( \varepsilon \) \\
			\( q_1 \) & & & \( \varepsilon \) & \( \varepsilon \) \\
			\( q_2 \) \\
			\( q_3 \)
		\end{tabular}
	\end{center}
	In step one, checking \( \delta(q_0,a) \) and \( \delta(q_1,a) \), we arrive at
	\begin{center}
		\begin{tabular}{c|cccc}
			& \( q_0 \) & \( q_1 \) & \( q_2 \) & \( q_3 \) \\\hline
			\( q_0 \) & & \( a \) & \( \varepsilon \) & \( \varepsilon \) \\
			\( q_1 \) & & & \( \varepsilon \) & \( \varepsilon \) \\
			\( q_2 \) \\
			\( q_3 \)
		\end{tabular}
	\end{center}
	The only remaining entry is \( (q_2, q_3) \), and this is not filled in a single step.
	Hence \( q_2 \sim q_3 \).
\end{example}
\begin{corollary}
	The equivalence problem for regular grammars is solvable.
\end{corollary}
Hence, for regular grammars, all of our desirable closure properties are true, and all of our motivating decision problems are solvable.
